Clustering properties of ultrahigh energy cosmic rays and the search for their astrophysical sources
The arrival directions of ultrahigh energy cosmic rays (UHECRs) may show anisotropies on all scales, from just above the experimental angular resolution up to medium scales and dipole anisotropies. We find that a global comparison of the two-point auto-correlation function of the data with that of catalogues of potential sources is a powerful diagnostic tool. In particular, this method is far less sensitive to unknown deflections in magnetic fields than cross-correlation studies, while retaining a strong discrimination power among source candidates. We illustrate these advantages by considering ordinary galaxies, gamma ray bursts and active galactic nuclei as possible sources. Already the sparse publicly available data suggest that the sources of UHECRs may be a strongly clustered sub-sample of galaxies or of active galactic nuclei. We present forecasts for various source distributions which can soon be tested by the Pierre Auger Observatory.



PACS numbers: 98.70.Sa, 98.54.Cm 



I. INTRODUCTION 

The identification of the sources of ultrahigh energy cosmic rays (UHECRs) and, more generally, the question whether astronomy with charged particles is possible are two important unresolved problems of astroparticle physics. The answer to the latter question depends both on the magnitude of deflections in magnetic fields (which in turn also depends on the chemical composition of UHECR primaries) and on the number density and luminosity of UHECR sources. No consensus has yet emerged on the origin and the amplification mechanisms of primordial magnetic fields, nor on the present magnitude and structure of extragalactic magnetic fields outside galaxy cluster cores [1, 2]. Uncertainties in modeling strong interactions prevent a clean determination of the fraction of heavy nuclei in UHECRs at E > 10^18 eV [3]. Thus theoretical predictions about the prospects of charged particle astronomy differ drastically, and the answer has to come from experiment.

There are various pieces of evidence in the available experimental data. The AGASA data contain several small-scale clusters, i.e. clusters of events within its experimental angular resolution [4]. This result triggered a series of works studying the auto-correlation of UHECR data at small angular scales [5, 6] or correlating UHECRs with potential astrophysical sources [7]. For instance, the best-fit value for the density n_s of UHECR sources found in Ref. [?] is (1-3) × 10^-5 Mpc^-3, while the 2σ confidence region ranges from 2 × 10^-6 Mpc^-3 to ~10^-2 Mpc^-3, i.e. up to the density of galaxies. The large statistical error of this estimate comes mainly from the small number of doublets with less than 3-5 degrees separation, while deflections in magnetic fields of more than a few degrees would result in a systematic overestimation of n_s. Correlation analyses of the small UHECR data set have their own problems: in order to avoid too large a number of potential sources per angular search bin, one has to choose either a very high energy cut or a very specific test sample, e.g. a small subset of all Active Galactic Nuclei (AGN). Although some studies found significant correlations, in particular with BL Lacs, these results have remained controversial.

A second piece of evidence is anisotropies on medium scales. The authors of Ref. [8] analyzed the available data set of CR arrival directions from the HiRes stereo, AGASA, Yakutsk and SUGAR experiments with energies E > 4 × 10^19 eV in the HiRes energy scale. They found evidence at the ~3σ C.L. for anisotropies on scales of 10-35 degrees, with a clear minimum of the chance probability around 20-30 degrees. This result is consistent with the theoretical expectations for anisotropies associated with large-scale structures (LSS) from Ref. [9]. Further studies showed that the correlations are best explained if UHECR sources are over-biased with respect to normal galaxies and/or if the cosmic ray horizon is smaller than expected for rectilinearly propagating protons [10].

Intriguingly, similar findings seem to emerge from an analysis of the preliminary data from the Pierre Auger Observatory (PAO). For 64 events with E > 4 × 10^19 eV, the data presented in Ref. [11] show a surplus of clustering in the broad range from 7 to 30 degrees. The chance probability distribution has its minimum at 7 degrees, with a second, broad minimum between 19 and 24 degrees, and is quite similar to the distribution obtained with 57 events in Ref. [8]. Remarkably, the PAO data also contain "8 doublets separated by less than 7 degrees in the 19 highest energy events", i.e. with E > 5.75 × 10^19 eV [11].

Finally, the last piece of evidence comes from the UHECR energy spectrum. In particular, the long controversy about the continuation of the spectrum [12] beyond the Greisen-Zatsepin-Kuzmin (GZK) cutoff [13, 14] seems to be finally resolved by the latest data from HiRes [15] and the PAO [16], which both detect with high confidence (> 5σ) a prominent steepening of the spectrum compatible with the GZK attenuation. Complemented by the new stringent PAO limit on the fraction of photon primaries in UHECRs [17], the data clearly point toward a "standard" scenario in which the bulk of UHECR sources have an astrophysical origin in the nearby universe, with more exotic top-down scenarios playing at most a sub-dominant role. This evidence makes a detailed study of possible astrophysical UHECR sources, of the kind addressed in the following, timely.

The main aim of the present work is to compare the auto-correlation function of potential UHECR sources with these early results of the PAO on UHECR arrival directions, and to provide forecasts of the clustering expected for different classes of sources, which can be checked shortly by the PAO. We shall also comment on how the clustering features of the publicly available world data set compare with expectations. This analysis is timely because, while increasing evidence is accumulating in favor of an astrophysical origin of UHECRs, it is still unclear how accurately one can identify the sources of UHECRs and which tools are best suited to do so. Here we advocate the importance of a global comparison, i.e. a comparison on all angular scales, of the observed auto-correlation function of arrival directions with the expectations for different source and primary scenarios. At first glance, cross-correlation tests with source catalogues might appear to be the ideal tool to identify the UHECR sources, but if used alone they could be insufficient or misleading. First, the angular resolution of O(1°) of UHECR observatories is poor by astronomical standards. Additionally, UHECRs are plagued by non-negligible magnetic field deflections, except maybe at the highest energies observed and for proton primaries. Non-spurious signals in a cross-correlation analysis can only be expected if the overall magnetic deflection of UHECRs is below the size of the angular bin used. Small-scale auto-correlation studies are less sensitive to magnetic fields, since only the relative deflections between pairs of events enter. Yet, since the overall spreading induced by the magnetic fields is unknown, an auto-correlation analysis limited to the first angular bin (whose size is chosen a priori, e.g. motivated by the angular resolution of the observatory) may be unsuccessful and/or have an ambiguous interpretation. By definition, however, if UHECR astronomy is possible at all, at least in a statistical sense, the auto-correlation function at sufficiently large angular scales should reflect the analogous properties of the sources. As a more ambitious goal, a global analysis also offers the possibility to infer the average size of deflections at the chosen energy scale, and thus to perform a kind of magnetic field reconstruction.

For this line of reasoning to be effective, one first has to show that the auto-correlation functions of different potential UHECR sources differ significantly and thus might be used to identify the UHECR sources. This task is addressed in Sec. II, where we discuss several astronomical catalogues. For the study of UHECR anisotropies, catalogues should cover the largest possible fraction of the sky, ideally complete in distance up to redshift z ~ 0.1. We shall see that this is rarely the case at present, which forces us to limit most of our quantitative comparisons to the sample of UHECRs with the highest energies: as a rule of thumb, the higher the UHECR energy, the smaller the energy-loss horizon (and the magnetic deflections), and the more reliable the catalogue. Turning this into quantitative statements, however, requires some assumptions on the nature of the primary particles and on the absolute energy scale of the experiments. In Sec. III we perform a comparison with the PAO results under the hypothesis of proton primaries and using two different assumptions on the energy scale. There, we also comment on the world data set available from the HiRes, AGASA, Yakutsk, and SUGAR experiments. Finally, we discuss our results and conclude in Sec. IV.

FIG. 1: Top panel: PSCz galaxies within z = 0.02, color coded from black to green according to increasing redshift. Bottom panel: AGNs within z = 0.02, color coded from red to yellow according to increasing redshift. Both panels are in Galactic coordinates.



II. GALAXY AND AGN CORRELATION FUNCTIONS

A. Astronomical catalogues 

Among the astrophysical objects most often proposed as UHECR sources are AGNs in general, specific sub-classes like Blazars, Radio or Seyfert galaxies, Gamma Ray Bursts (GRBs) or young neutron stars (for a review see e.g. [18]). All these sources follow the LSS of matter, although with different and scale-dependent biases. In order to understand how large this bias is for different source classes, we now examine the clustering properties of normal galaxies (which may host candidates like neutron stars) and of AGNs in the nearby universe. We use the PSCz catalogue [19] as a sample of the galaxy distribution and the 12th edition of the Veron-Cetty & Veron (VCV) catalogue [20] for the AGNs. We also study the clustering properties of several sub-samples, imposing cuts in absolute magnitude for the galaxy catalogue and subdividing the AGN catalogue into Seyfert galaxies of type 1 (S1), type 2 (S2) and LINERs (S3), according to the classification reported in the VCV catalogue itself. In Fig. 1 the various kinds of AGNs and the PSCz galaxies within z = 0.02 are shown in Galactic coordinates, color coded according to their redshift. The empty region along the Galactic plane is the so-called zone of avoidance, due to the presence of the Milky Way, and does not reflect an intrinsic lack of objects.

The details of the PSCz catalogue, in particular a description of the mask and of the selection function, are summarized in Ref. [19]. The B-band magnitudes reported in the catalogue are, however, biased and show systematic offsets between different regions of the sky, where the galaxy magnitudes have been taken from different catalogues with different calibrations. To overcome this problem, we match sources in the PSCz catalogue with sources in the 2MASS extended source catalogue (XSC) [21] to get accurate magnitudes in the infrared (2.15 µm) K-band. This is done by requiring that a PSCz galaxy lies inside the 20 mag arcsec^-2 isophote in the K-band of the matching 2MASS galaxy. We find that ~80% of the galaxies in the PSCz catalogue have a counterpart in the 2MASS XSC, and we discard the others. We then construct various sub-samples of the catalogue by performing cuts in absolute magnitude, using the distance modulus relation

$$ M = m - 5\log_{10}(d/\mathrm{Mpc}) - 25 \simeq m - 43.16 + 5\log_{10} h - 5\log_{10} z $$

(where h is the reduced Hubble parameter; the redshift-dependent K-correction, negligible for z < 0.03-0.04, has not been considered). For these sub-samples we also empirically build new selection functions, using a smooth weight function chosen to reproduce the redshift distribution of each sub-sample. In the top panel of Fig. 2 we show the fraction f(z) of galaxies from the PSCz catalogue for the sub-samples obtained with the luminosity cuts M < −24 and M < −24.5, in redshift bins of width 0.005. In the same panel we also show the complete galaxy sample with no cut imposed. The PSCz catalogue is flux complete, and the figure shows that the brightest sub-samples are essentially also volume complete out to z ~ 0.02. In contrast, the full sample shows prominent signatures of volume incompleteness already at very low redshift.
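As a rough sketch of the luminosity cut and of the completeness check described above, the snippet below converts an apparent K-band magnitude and a redshift into an absolute magnitude via the distance modulus, assuming the low-redshift Hubble law d = cz/H0 and neglecting the K-correction as in the text. It also computes the share of objects per redshift bin expected for a volume-complete sample (number counts growing like z^2 dz), which is the reference behavior for f(z). The function names and the default H0 = 70 km/s/Mpc are illustrative assumptions; with that value the constant 5 log10[c/(H0 Mpc)] + 25 evaluates to about 43.16.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s


def absolute_magnitude(m, z, h0=70.0):
    """M = m - 5 log10(d/Mpc) - 25 with the low-z Hubble law d = c z / H0.

    h0 is in km/s/Mpc (70 is an illustrative choice); no K-correction.
    """
    d_mpc = C_KM_S * np.asarray(z, dtype=float) / h0
    return np.asarray(m, dtype=float) - 5.0 * np.log10(d_mpc) - 25.0


def volume_complete_fraction(edges):
    """Expected fraction per redshift bin for a volume-complete sample.

    The number of objects in a thin shell grows like z^2 dz, so a bin
    [z_lo, z_hi] receives a share proportional to z_hi^3 - z_lo^3.
    """
    edges = np.asarray(edges, dtype=float)
    shares = edges[1:] ** 3 - edges[:-1] ** 3
    return shares / shares.sum()


def observed_fraction(z, edges):
    """Fraction f(z) of catalogue objects per bin and its Poisson error."""
    counts, _ = np.histogram(z, bins=edges)
    total = counts.sum()
    return counts / total, np.sqrt(counts) / total


# Constant of the distance modulus for H0 = 70 km/s/Mpc.
print(round(5.0 * np.log10(C_KM_S / 70.0) + 25.0, 2))  # 43.16
# Volume-complete expectation in bins of width 0.005 up to z = 0.02.
print(volume_complete_fraction(np.arange(0.0, 0.0201, 0.005)))
```

A luminosity cut such as M < −24 then becomes a boolean mask, e.g. `absolute_magnitude(m_k, z) < -24.0`, applied to the matched catalogue arrays (`m_k` and `z` are hypothetical array names).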

Unlike the PSCz catalogue, the VCV catalogue is a compilation of observations and is known to suffer from increasing incompleteness with increasing redshift. We assume that, at least in the very nearby universe (i.e. z < 0.02), the catalogue can be considered fairly complete, and we build selection functions analogously to the PSCz sub-samples described above. It will later be seen that our results are quite robust against this assumption. We find that: (i) within z = 0.02 the AGN sample closely follows the matter distribution, as expected (see Fig. 1); (ii) although the overall catalogue is fairly complete within this distance, some subsets of AGNs suffer from significant incompleteness and therefore from selection bias. In the bottom panel of Fig. 2 we show the fraction of S1, S2, and S3 galaxies from the VCV catalogue in redshift bins out to z = 0.03, together with the Poisson fluctuations due to the finiteness of the sample (the sample of active galaxies within z = 0.02 includes ~500 AGNs, of which ~150 S1, ~200 S2, and ~80 S3). The S1 and, to a lesser extent, the S2 galaxies show a behavior reasonably close to the z^2 increase expected for a truly complete sample. Since they constitute by far the largest fraction of AGNs in the VCV catalogue, the catalogue as a whole can be regarded as at least reasonably complete out to z ~ 0.02. We note, however, that the S3 sub-sample seems to be the most affected by incompleteness effects, a point which is further discussed below.

FIG. 2: Top panel: The fraction f(z) of galaxies from the PSCz catalogue for the luminosity cuts M < −24 and M < −24.5 and with no cut, in redshift bins of width 0.005. Bottom panel: The fraction f(z) of S1, S2, and S3 AGNs. The dotted histogram shows the expected behavior of a volume-complete catalogue, i.e. with differential fraction ∝ z^2. The error bars are the Poisson fluctuations from the number count.

FIG. 3: The auto-correlation function w(ϑ) as a function of ϑ, with 1σ error bars, for galaxies with different cuts in magnitude (left) and for AGNs and different sub-samples (right).
It is difficult to estimate the number density of AGNs in the VCV catalogue given that, by construction, astrophysical sources are included without any specific selection rule. As a rough estimate, assuming volume completeness, 500 AGNs within z = 0.02 give n_s ~ 5 × 10^-4 h^3 Mpc^-3. The same density also corresponds to the galaxies brighter than M_cut = −24, while 50 uniformly distributed sources within the same volume would give n_s ~ 5 × 10^-5 h^3 Mpc^-3. For ordinary galaxies the number density depends strongly on the assumed M_cut and typically ranges from 10^-3 h^3 Mpc^-3 to values as high as 10^-1 h^3 Mpc^-3 for Milky Way-like galaxies. In any case, the number density of the UHECR sources depends on the horizon containing the sources, and thus on issues like the nature of the primaries and the absolute energy scale. Therefore, in the following we will focus mainly on the absolute number of sources (above the assumed CR energy threshold) and on their bias/overdensity with respect to the distribution of matter as the principal observables.
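The rough density estimates above follow from dividing the number of objects by the volume of a sphere of radius d = cz/H0. A minimal sketch, assuming volume completeness, Euclidean geometry (adequate at z ≤ 0.02) and distances in h^-1 Mpc with H0 = 100 h km/s/Mpc, reproduces the quoted orders of magnitude; the helper names are illustrative.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def radius_mpc_over_h(z_max):
    """Radius in h^-1 Mpc from the low-z Hubble law with H0 = 100 h km/s/Mpc."""
    return C_KM_S * z_max / 100.0


def number_density(n_objects, z_max):
    """Mean density in h^3 Mpc^-3, assuming volume completeness out to z_max."""
    volume = 4.0 / 3.0 * math.pi * radius_mpc_over_h(z_max) ** 3
    return n_objects / volume


print(f"{number_density(500, 0.02):.1e}")  # 5.5e-04 (500 AGNs within z = 0.02)
print(f"{number_density(50, 0.02):.1e}")   # 5.5e-05 (50 sources, same volume)
```

The radius is about 60 h^-1 Mpc, so the volume is close to 9 × 10^5 h^-3 Mpc^3, which is why the estimates scale simply as N × 1.1 × 10^-6 h^3 Mpc^-3.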

BL Lac AGNs, also popular UHECR source candidates, are very rare in our GZK neighborhood. The nearest confirmed BL Lac in the VCV catalogue is the object RXS J05055+0416 at z = 0.027. Even including possible BL Lacs, only 6 objects are found in the VCV catalogue within z = 0.03. If such a small number of sources were indeed responsible for the UHECRs, even more peculiar clustering signatures, such as a large number of triplets or even quadruplets, should be expected, as long as deflections by extragalactic magnetic fields are not too large. At the same time, a cross-correlation between these objects and the UHECR multiplets should become evident. This possibility could therefore be easily confirmed or disproved by looking for these signatures, and we will not discuss it further in the following.

Finally, we briefly discuss the case of GRBs as UHECR sources. The observed GRB rate is R_GRB ~ 0.5 × 10^-9 Mpc^-3 yr^-1 according to Ref. [22]. However, deflections in the extragalactic magnetic fields (EGMF) lead to time delays τ that in turn increase the effective density of GRBs to n_s = τ R_GRB. The clustering properties of GRBs are in general quite different from those of AGNs and massive galaxies. Long-duration GRBs, which make up about 2/3 of all GRBs, are associated with supernova events in extremely massive stars, and therefore their distribution essentially follows the star formation rate. Star-forming galaxies are mainly spirals and irregulars, which are less clustered than average galaxies in the PSCz catalogue. The remaining 1/3 of the GRBs, which are most likely the result of binary collisions, have a distribution close to that of SN Ia's, but they are not considered very likely sites for UHECR acceleration. Thus GRBs cluster less than average PSCz galaxies and, in the following considerations, we use randomly distributed sources as a rough template for their clustering properties.
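The relation n_s = τ R_GRB is simple arithmetic; the sketch below evaluates it for the rate quoted above and a few spreads of time delays. The values of τ are hypothetical placeholders chosen only for illustration, since the actual delays depend on the unknown EGMF; for example, τ = 10^5 yr gives n_s = 5 × 10^-5 Mpc^-3.

```python
R_GRB = 0.5e-9  # observed GRB rate in Mpc^-3 yr^-1, as quoted in the text


def effective_density(rate, delay_spread_yr):
    """Effective density n_s = tau * R of transient sources, in Mpc^-3."""
    return rate * delay_spread_yr


# Hypothetical spreads of magnetic time delays, for illustration only.
for tau in (1e4, 1e5, 1e6):
    print(f"tau = {tau:.0e} yr -> n_s = {effective_density(R_GRB, tau):.1e} Mpc^-3")
```

Comparing the result with the densities estimated for AGNs and bright galaxies shows for which delay spreads GRBs would behave as a rare or as an abundant source population.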

B. Correlation functions 

An important point to establish for the following arguments is that different astrophysical catalogues of candidate UHECR sources have sufficiently different clustering properties. To that purpose, we calculate in this section the auto-correlation function w(ϑ) of the various samples. In the past, a commonly employed estimator for the auto-correlation has been the intuitive DD/RR − 1. This estimator is, however, sub-optimal, especially for the estimation of the variance [23], while an optimal estimator is given by [23, 24]:

$$ w(\vartheta) = \left\langle \frac{DD - 2DR + RR}{RR} \right\rangle , \qquad (1) $$

where D denotes the data set and R a randomly generated data set with the same bias characteristics as the data (same mask, same selection function, same exposure, etc.), while DD, DR, and RR are the normalized pair counts in each angular bin around ϑ. The angle brackets indicate that the final result is an ensemble average over many random realizations. Note that for data consistent with a random distribution w(ϑ) is zero within the errors. The resulting auto-correlation functions are shown in the left panel of Fig. 3 for galaxies and in the right panel for AGNs, without weights for the sources (i.e. without selection effects and attenuation) and using the same mask for all sets in order to have an unbiased comparison. The errors in each bin can be estimated as

$$ \mathrm{rms}[w(\vartheta)] = [1 + w(\vartheta)] \left[ \frac{1}{n(n-1)/2 \, \langle RR \rangle} \right]^{1/2} , \qquad (2) $$

where n is the number of points in the data set D, and hence n(n − 1)/2 is the total number of unique pairs.
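A compact sketch of Eqs. (1) and (2) for points on the sphere follows. It assumes that the random catalogues already share the mask, selection and exposure of the data (in the usage example both are simply isotropic, so no mask is applied), normalizes each pair count by the corresponding total number of pairs, and averages the estimator over random realizations as the angle brackets in Eq. (1) indicate. Brute-force O(n^2) pair counting is used, which is adequate for catalogues of a few thousand objects; the function names are illustrative.

```python
import numpy as np


def unit_vectors(lon_deg, lat_deg):
    """Longitude/latitude in degrees -> unit vectors on the sphere."""
    lon, lat = np.radians(lon_deg), np.radians(lat_deg)
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))


def separations_deg(a, b=None):
    """Angular separations in degrees: unique pairs of a, or all pairs a x b."""
    if b is None:
        i, j = np.triu_indices(len(a), k=1)
        cosines = np.einsum("ij,ij->i", a[i], a[j])
    else:
        cosines = (a @ b.T).ravel()
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))


def landy_szalay(data, randoms, edges):
    """w(theta) of Eq. (1), averaged over random catalogues, and rms of Eq. (2).

    data: (n, 3) unit vectors; randoms: list of (m, 3) arrays assumed to share
    the mask, selection and exposure of the data. Each pair count is
    normalised by its total number of pairs.
    """
    n = len(data)
    dd = np.histogram(separations_deg(data), edges)[0] / (n * (n - 1) / 2)
    ws, rrs = [], []
    for r in randoms:
        m = len(r)
        dr = np.histogram(separations_deg(data, r), edges)[0] / (n * m)
        rr = np.histogram(separations_deg(r), edges)[0] / (m * (m - 1) / 2)
        with np.errstate(divide="ignore", invalid="ignore"):
            ws.append((dd - 2.0 * dr + rr) / rr)  # empty RR bins give nan/inf
        rrs.append(rr)
    w = np.mean(ws, axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        err = (1.0 + w) / np.sqrt(n * (n - 1) / 2 * np.mean(rrs, axis=0))
    return w, err


def isotropic(k, rng):
    """k isotropic unit vectors (normalised Gaussian triplets)."""
    v = rng.normal(size=(k, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


rng = np.random.default_rng(1)
edges = np.linspace(0.0, 60.0, 13)  # twelve 5-degree bins
data = isotropic(300, rng)
w, err = landy_szalay(data, [isotropic(600, rng) for _ in range(5)], edges)
print(np.round(w / err, 1))  # isotropic data: consistent with zero within a few rms
```

For a real catalogue, `data` would hold the source directions and `randoms` unclustered points drawn with the catalogue's mask and selection function; building those random sets is the part this sketch leaves out.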

Both samples show a strong auto-correlation at small scales, although the clustering of normal galaxies is considerably less pronounced (w_AGN(1°)/w_gal(1°) ~ 3). We will see that, in relation to the small-scale clustering seen by the PAO, this difference already tightly constrains the possible contribution of normal galaxies as sources of the highest energy CRs. The situation changes when bright sub-samples of the PSCz galaxies are considered, whose clustering properties more closely resemble those of the AGNs. This is not surprising given that most of the brightest galaxies are in fact AGNs, so the two samples are not truly independent. Regarding the AGN sub-samples, the clustering of S1 objects shows no strong differences from that of all AGNs; by contrast, the S2 and S3 subtypes show a stronger auto-correlation on the smallest scales, ϑ < 3°. Note, however, that the AGN samples, having a smaller number of objects, generally also have larger error bars; in this case, since Poisson statistics makes the errors on w(ϑ) decrease for increasing ϑ, intermediate scales ϑ ~ 10-30° might be optimal to distinguish between different sources. This is especially true for UHECRs when the statistics are very limited and/or the smearing at the smallest scales by magnetic fields is important.

The above results are in general in good agreement with other, more detailed studies of AGN bias properties. In particular, the clustering properties of AGNs have been studied extensively, for example by Kauffmann et al. [25] using the SDSS catalogue. They find that AGNs are far more common in massive galaxies and that the AGN correlation function resembles that of massive early-type galaxies, which is similar to what we find for the low-redshift VCV sample.



III. COMPARISON WITH THE DATA AND FORECAST FOR AUGER

We now study the clustering of the various source samples considered at rather small scales, comparing them with the existing observations. In particular, we focus on one of the most remarkable findings reported by the PAO, namely that of "8 doublets within 7 degrees out of the 19 highest energy events" [11].

Because of the limited UHECR statistics available, and to allow a direct comparison with the Auger findings, in this section we use as main observable C(ϑ), a slightly modified version of the function w(ϑ) of the previous section, defined as

$$ C(\vartheta) = \sum_{i=2}^{N} \sum_{j=1}^{i-1} \Theta(\vartheta - \vartheta_{ij}) , \qquad (3) $$

i.e. the cumulative number of pairs within the angular distance ϑ, where Θ is the step function (with Θ(0) = 1), N the number of CRs considered, and ϑ_ij the angular distance between events i and j. Although C(ϑ) introduces further correlations between different angular bins, the use of cumulative counts instead of differential ones has the great advantage of significantly reducing the dependence on unknown magnetic field deflections, a crucial point for UHECR astronomy. The ensemble average is performed over a large number M = 10^5 of Monte Carlo sets. The events are extracted randomly from the catalogue under consideration, and we take into account the PAO exposure as described in Ref. [26], assuming as characteristic parameters for the PAO a maximal zenith angle θ_max = 60° for a CR event and δ_PAO = −35° for the PAO latitude. The selection effects of the catalogue and the attenuation due to CR propagation are included by assigning proper weights to the sources, which in turn are used as emission probabilities in the simulation. Here a hypothesis on the nature of the particles and on the overall energy scale enters. To illustrate this point, in Table I we report the distance D_1/2 from which 50% of the UHECR flux comes, for different assumptions about energy and chemical composition, assuming uniformly distributed sources and rectilinear propagation (see e.g. [27]). Note that the injection spectral index has only a minor effect on D_1/2 in the energy range considered.
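The Monte Carlo procedure described above can be sketched for the simplest template, an isotropic sky seen through the PAO exposure, using the parameters θ_max = 60° and δ_PAO = −35° quoted in the text. We assume the standard geometric exposure of a ground array that is fully efficient up to θ_max (the widely used formula of Sommers), which may differ in detail from the treatment of Ref. [26]. Catalogue weights and propagation are not included, pairs are counted with ϑ_ij ≤ ϑ to match Θ(0) = 1, and far fewer than M = 10^5 sets are used to keep the example quick. All function names are illustrative.

```python
import numpy as np

THETA_MAX = np.radians(60.0)  # maximal zenith angle quoted in the text
A0 = np.radians(-35.0)        # PAO latitude (approximate value quoted in the text)


def relative_exposure(dec):
    """Relative geometric exposure vs declination (radians), arbitrary units.

    Assumes full efficiency for zenith angles below THETA_MAX and uniform
    coverage in right ascension (Sommers-type formula).
    """
    dec = np.asarray(dec, dtype=float)
    xi = (np.cos(THETA_MAX) - np.sin(A0) * np.sin(dec)) / (np.cos(A0) * np.cos(dec))
    alpha_m = np.arccos(np.clip(xi, -1.0, 1.0))  # 0 if xi > 1, pi if xi < -1
    return (np.cos(A0) * np.cos(dec) * np.sin(alpha_m)
            + alpha_m * np.sin(A0) * np.sin(dec))


# Upper bound for rejection sampling (1% margin over a fine declination grid).
W_MAX = 1.01 * relative_exposure(np.linspace(-np.pi / 2 + 1e-6, np.pi / 2 - 1e-6, 4001)).max()


def sample_isotropic_events(n, rng):
    """n isotropic arrival directions (unit vectors) filtered by the exposure."""
    ra_all, dec_all = [], []
    while sum(len(r) for r in ra_all) < n:
        ra = rng.uniform(0.0, 2.0 * np.pi, 4 * n)
        dec = np.arcsin(rng.uniform(-1.0, 1.0, 4 * n))
        keep = rng.uniform(0.0, W_MAX, 4 * n) < relative_exposure(dec)
        ra_all.append(ra[keep])
        dec_all.append(dec[keep])
    ra = np.concatenate(ra_all)[:n]
    dec = np.concatenate(dec_all)[:n]
    return np.column_stack((np.cos(dec) * np.cos(ra), np.cos(dec) * np.sin(ra), np.sin(dec)))


def cumulative_pairs(vecs, thetas_deg):
    """C(theta) of Eq. (3): number of unique pairs with separation <= theta."""
    i, j = np.triu_indices(len(vecs), k=1)
    cosines = np.clip(np.einsum("ij,ij->i", vecs[i], vecs[j]), -1.0, 1.0)
    sep = np.degrees(np.arccos(cosines))
    return np.array([np.count_nonzero(sep <= t) for t in np.atleast_1d(thetas_deg)])


rng = np.random.default_rng(0)
n_events, n_sets = 19, 2000  # the text uses M = 10^5 sets; fewer here for speed
c7 = np.array([cumulative_pairs(sample_isotropic_events(n_events, rng), 7.0)[0]
               for _ in range(n_sets)])
print("mean C(7 deg):", c7.mean(), " P(C >= 8):", np.mean(c7 >= 8))
```

For catalogue-based models, the isotropic draw would be replaced by drawing sources with probabilities proportional to their weights, keeping the same exposure filter and the same pair counting.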

In the following we assume proton primaries and use the propagation window function W(z, E_cut) as calculated in Ref. [?]. Given the importance of the clustering signal observed by the PAO, we focus mostly on the case of N = 19 events. In order to study the sensitivity to the assumed energy scale, we consider two cases: (i) the preliminary calibration of the energy scale presented by the PAO is correct, and thus the 19 highest energy events have energies above E_cut = 5.75 × 10^19 eV; (ii) UHE air-shower experiments are affected by an overall uncertainty in their energy calibration, whose normalization might, however, be obtained by requiring that

FIG. 4: The average C(ϑ) and its 1σ variation for the case of 19 events and: a finite number of isotropically distributed sources compared with the continuous limit (top-left panel); galaxies with different cuts in magnitude (bottom-left panel); AGN sub-classes compared with the galaxy distribution (top-right panel). All the previous cases assume a cut in the window function at E_cut = 8 × 10^19 eV. The bottom-right panel is the same as the top-right one, but for E_cut = 5.75 × 10^19 eV. Notice that the error regions are highly asymmetric for ϑ < 40° (see Figs. 5-7) and the upper 1σ error regions shown in the plots almost coincide with the mean.




FIG. 5: Probability distribution of C(7°) for the case of 19 events and an energy cut E_cut = 8 × 10^19 eV, for the different astrophysical models considered (left panel) and for a finite number of uniformly distributed sources (right panel).



they correctly reproduce the spectral features of a model, in our case the dip model [28]. The correction factor

FIG. 6: As in the top panels of Fig. 4

found with this method is 1.4 for the PAO energy scale, in which case the highest energy events have energies above E_cut ≈ 8 × 10^19 eV.
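Two small numerical checks of this paragraph and of Table I: rescaling E_cut = 5.75 × 10^19 eV by the factor 1.4 gives about 8.05 × 10^19 eV, and converting the D_1/2 values of Table I with the low-redshift relation z = H0 D/c reproduces its z_1/2 column if H0 = 70 km/s/Mpc is assumed. That value of H0 is our inference from the numbers, not a value stated in the text; the helper names are illustrative.

```python
C_KM_S = 299792.458  # speed of light in km/s


def rescaled_cut(e_cut_ev, factor=1.4):
    """Threshold energy after rescaling the experimental energy scale."""
    return e_cut_ev * factor


def z_from_distance(d_mpc, h0=70.0):
    """Low-redshift conversion z = H0 D / c; h0 in km/s/Mpc (assumed value)."""
    return h0 * d_mpc / C_KM_S


print(f"{rescaled_cut(5.75e19):.2e} eV")  # 8.05e+19 eV
table_i_d_half = [160, 100, 40, 30, 80, 45]  # D_1/2 column of Table I, in Mpc
print([round(z_from_distance(d), 3) for d in table_i_d_half])
# [0.037, 0.023, 0.009, 0.007, 0.019, 0.011]
```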

For the galaxy samples we use the PSCz selection function, while for the AGN samples we adopt the approximation ψ(z) = z^α/z^2 with α ≈ 0.4 as the best-fit slope to the redshift distribution of the samples (see Fig. 2). Although rather crude, this simple approach can be justified by the fact that, especially at relatively small angular scales, the clustering properties are only slightly affected by the exact choice of the selection and propagation weights. As a check, we verified that even the extreme choice of neglecting all selection and propagation effects does not appreciably change the expected mean and distribution of the number of pairs. This approximation breaks down when the UHECR horizon becomes much larger than the distance up to which a catalogue is complete, a point we shall come back to later.

Species | E_cut/10^19 eV | D_1/2/Mpc | z_1/2
p       | 5.0            | 160       | 0.037
p       | 6.0            | 100       | 0.023
p       | 8.0            | 40        | 0.009
28Si    | 6.0            | 30        | 0.007
56Fe    | 6.0            | 80        | 0.019
56Fe    | 8.0            | 45        | 0.011

TABLE I: The distance D_1/2 (or, equivalently, the redshift z_1/2) within which 50% of the UHECR flux above E_cut originates, for different assumptions on energy and chemical composition, assuming isotropic and uniform sources and rectilinear propagation. Adapted from plots in [27].

[Figure: axis label ϑ [deg]; caption fragment "... but for a statistics of 40 events."]

Our results for the average C(ϑ) and its 1σ variations for the case of 19 events are shown in Fig. 4. From top-left to bottom-right, we present the case of a finite number of uniformly distributed sources together with the continuous limit; AGN sub-classes compared with the galaxy distribution and an isotropic sky; and galaxies with different cuts in magnitude. While the first three panels assume E_cut = 8 × 10^19 eV, the bottom-right panel is the same as the second one (AGN sub-classes), but for E_cut = 5.75 × 10^19 eV. We note that the strong clustering observed by the PAO is quite exceptional: both a uniform random distribution (corresponding to the limit of an infinite number of sources) and the galaxy distribution generally predict too small a number of pairs within 7°. Active galactic nuclei, and in particular their sub-samples, are much more likely to produce the degree of clustering observed by the PAO. The same holds for the sub-samples of bright galaxies, where the brightest galaxies (and thus the set with the smallest number density n_s) provide the best match to the observed clustering. For a lower energy scale, the horizon is larger and the sky more isotropic, so that especially the LSS and the isotropic sky hypotheses have even more trouble explaining the observations. Notice, however, that the S3 sample, which seems to be the AGN sub-sample most consistent with the high number of pairs, may suffer from a strong selection bias in the VCV catalogue: LINERs are comparatively weak AGNs, which are preferentially detected at low z (see Fig. 2). An additional problem is that most LINERs are outside the field of view of the PAO and, since their total number is much smaller than that of S1 and S2 AGNs, cosmic variance plays a significant role (as for any other sample made of a small number of objects). Although not manifest from the plots of Fig. 4, another caveat is that, apart from the isotropic case with an infinite number of sources, virtually all models are consistent with the observations at the 3σ level. The exact confidence levels are illustrated in Fig. 5, where the full distributions of the number of pairs within 7° for various models are compared to the Auger result. Also, we had to concentrate on the largest fluctuation in the PAO data set, since this is the only presently available information. At different angles, we must expect less significant clustering. That said, it is interesting
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Model ||C-(7°)|C(7°)|C+(7°)||C-(30°)|C(30°)|C+(30°) 
100 GRBs II 10 I 21 I 32 68 82 96 
S2 AGNs 9 18 31 85 105 151 

M-24.5 Gals 6 13 25 81 110 162 



that, as shown in Fig. [6l with a statistics doubled with 
respect to the one analyzed in [ll[ the errors should be- 
come sufficiently small to rule out most cases. 

Clearly the discrimination power between different 
source models would be greatly improved, if the expected 
functions C(i?) were compared to UHECR data not only 
at a single angular scale but on a range of values. Al- 
ready a comparison of the correlations at a second angle 
may be enough to distinguish among different cases. It 
is likely that a global comparison (based e.g. on a x 2 
method or a Kolmogorov-Smirnov test) of the correla- 
tion functions would provide a powerful diagnostic tool. 
This is one of the main results of our work and deserves
a specific example. For the present purposes, it is sufficient
to illustrate this point by studying the distribution
of the expected number of pairs within 7° (to stick to
the most notable finding of the PAO) and 30° (a typical
intermediate scale) for several models. In particular, in
Table II we report the average values C(7°) and C(30°)
and the 2σ lower and upper limits, denoted by C_ and
C+ respectively, for the expected number of pairs in
different models and N = 19 events. In Tables III and IV
we report analogous quantities for N = 40 and
N = 60, respectively. We consider three models: (a)
100 uniformly distributed sources, mimicking GRBs;
(b) the S2 subclass of AGNs; (c) the galaxies brighter
than M_cut = −24.5. These have been chosen to be basically
consistent with the 7° Auger data. Notably, even
with 19 events, models (a) and (b) show significant
differences at 30°, a difference that becomes quite
large and easily testable with a modest increase in
statistics to 40 events. The latter two models are instead
almost degenerate from the point of view of clustering
properties, which is not surprising since the
two samples have a similar number of objects and most
of them fall in both subsamples (i.e., they are not
independent). Note further that the distributions are
generally quite non-Gaussian, with a prominent tail toward
higher numbers of pairs.
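The Monte Carlo logic behind such tables (draw N events from a source model many times, count the pairs within ϑ, and quote the mean with 2σ lower and upper limits) can be sketched as below. This is a minimal illustration under strong simplifying assumptions, not the procedure used for the tables: full-sky uniform exposure, equal flux for every source, no distance weighting or energy losses, and a crude Gaussian smearing of hypothetical width `sigma_deg` standing in for magnetic deflections and angular resolution. All function names are hypothetical. The 2σ limits are taken as the 2.275% and 97.725% percentiles, which suits the non-Gaussian distributions noted above.

```python
import numpy as np


def random_unit_vectors(n, rng):
    """Isotropic directions on the sphere from normalised 3D Gaussian draws."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def pair_count(dirs, theta_deg):
    """Cumulative number of distinct event pairs separated by <= theta_deg."""
    cosang = np.clip(dirs @ dirs.T, -1.0, 1.0)
    i, j = np.triu_indices(len(dirs), k=1)          # each unordered pair once
    sep_deg = np.degrees(np.arccos(cosang[i, j]))
    return int(np.count_nonzero(sep_deg <= theta_deg))


def simulate_events(n_events, n_sources, sigma_deg, rng):
    """Toy model: isotropic equal-flux sources, events smeared by ~sigma_deg."""
    sources = random_unit_vectors(n_sources, rng)
    idx = rng.integers(0, n_sources, size=n_events)  # source of each event
    # Crude small-angle smearing (assumption): perturb, then renormalise.
    ev = sources[idx] + np.radians(sigma_deg) * rng.normal(size=(n_events, 3))
    return ev / np.linalg.norm(ev, axis=1, keepdims=True)


def pair_count_band(n_events, n_sources, theta_deg, sigma_deg=3.0,
                    n_mc=2000, seed=0):
    """Mean of C(theta) and its 2.275% / 97.725% percentile limits."""
    rng = np.random.default_rng(seed)
    counts = np.array([
        pair_count(simulate_events(n_events, n_sources, sigma_deg, rng), theta_deg)
        for _ in range(n_mc)
    ])
    c_lo, c_hi = np.percentile(counts, [2.275, 97.725])
    return counts.mean(), c_lo, c_hi


if __name__ == "__main__":
    for theta in (7.0, 30.0):
        mean, lo, hi = pair_count_band(19, 100, theta)
        print(f"theta={theta:4.1f} deg: C_={lo:.0f}  C={mean:.1f}  C+={hi:.0f}")
```

Running `pair_count_band(19, 100, 7.0)` mimics only the structure of model (a) with 19 events; it is not expected to reproduce the tabulated numbers, since exposure and distance weighting are ignored.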



Model        | C_(7°) | C(7°) | C+(7°) | C_(30°) | C(30°) | C+(30°)
100 GRBs     |    ?   |    3  |    8   |    10   |    17  |    19
S2 AGNs      |    ?   |    3  |    9   |    12   |    21  |    35
M-24.5 Gals  |    ?   |    3  |    9   |    13   |    25  |    39



TABLE II: Observables related to the distribution of the
expected number of pairs within 7° and 30° for N = 19
events (see text for details on the notation). The
models reported are: 100 GRBs, S2 AGNs, and galaxies with
M_cut = −24.5.

As a final comment, we note that, due to the limited
information available, we have restricted the study to the
analysis of the cumulative number of pairs. However, the
above approach can easily be generalized to higher-order
statistics, like the cumulative counting of the proper
number of doublets, triplets, etc. [?]. A combined use
of these tools will likely provide even more robust and



TABLE III: As in Table II but for N = 40 events.



Model        | C_(7°) | C(7°) | C+(7°) | C_(30°) | C(30°) | C+(30°)
100 GRBs     |   27   |   40  |   65   |   161   |   194  |   233
S2 AGNs      |   26   |   44  |   67   |   207   |   261  |   320
M-24.5 Gals  |   21   |   32  |   52   |   195   |   268  |   330



TABLE IV: As in Table II but for N = 60 events.



stringent constraints on the nature and number of UHECR
sources.
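Higher-order statistics such as the number of doublets and triplets need a rule for grouping events into multiplets. One simple choice, assumed here for illustration and not necessarily the definition used in the cited work, is friends-of-friends grouping: two events are linked if their separation is at most ϑ, and a multiplet is a connected group of linked events. The sketch below implements this with a union-find structure; the function name `multiplet_counts` is hypothetical.

```python
from collections import Counter

import numpy as np


def multiplet_counts(dirs, theta_deg):
    """Return {multiplet size: number of multiplets} for friends-of-friends
    groups of unit vectors `dirs` (shape (N, 3)) with linking angle theta_deg."""
    n = len(dirs)
    parent = list(range(n))

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    cos_link = np.cos(np.radians(theta_deg))
    dots = dirs @ dirs.T
    for i in range(n):
        for j in range(i + 1, n):
            if dots[i, j] >= cos_link:          # separation <= theta_deg
                parent[find(i)] = find(j)       # merge the two groups

    sizes = Counter(find(i) for i in range(n))  # root -> group size
    return dict(Counter(s for s in sizes.values() if s >= 2))


if __name__ == "__main__":
    lon = np.radians([0.0, 4.0, 8.0, 90.0])     # toy events on the equator
    events = np.column_stack([np.cos(lon), np.sin(lon), np.zeros_like(lon)])
    print(multiplet_counts(events, 5.0))        # 0-4-8 deg chain: one triplet
    print(multiplet_counts(events, 3.0))        # no links at all
```

Note that friends-of-friends chains events: the 0° and 8° events end up in the same triplet at ϑ = 5° even though they are 8° apart, which is one reason other multiplet definitions may be preferable.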



A. Repeaters vs. small-scale clustering

An important issue for the future study of UHECR 
sources is to disentangle the case where an excess of pairs 
should be attributed to multiple events from a single 
point source from the case where the excess is produced 
by the small-scale correlation of two or more sources, as 
discussed in the previous sections for AGNs and bright 
galaxies. 

For a large fraction of the models considered above, the
predicted clustering is mostly due to the intrinsic
correlations of the sources rather than being caused by multiple
emissions from single sources¹. A formal but useful way
to illustrate this point is to look at the probability
distribution of C(0°), i.e. the number of pairs for ϑ = 0°
(repeaters). In Fig. 7 we report this distribution for the
case of 19 events and an energy cut of E_cut = 8 × 10^19 eV.
We show both the case of a finite number of uniformly
distributed sources and the case of the different astrophysical
models considered. In the former case, there should be
fewer than about 50 UHECR sources within the horizon
for repeaters to account for a dominant fraction of
the doublets.
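For the idealized case of sources with fixed relative fluxes p_i, the expected number of repeater pairs follows from a short calculation: the number of events n_i from source i is binomial, so E[n_i(n_i − 1)] = N(N − 1) p_i², and therefore E[C(0°)] = N(N − 1)/2 · Σ_i p_i². The sketch below evaluates this and cross-checks it with a multinomial Monte Carlo. It is a toy under stated assumptions (the equal-flux example is an assumption, and the distance weighting, exposure and energy cut entering Fig. 7 are not modeled); the function names are hypothetical.

```python
from math import comb

import numpy as np


def expected_repeater_pairs(n_events, flux_fractions):
    """E[C(0 deg)] = C(N, 2) * sum_i p_i**2 for independent event assignment."""
    p = np.asarray(flux_fractions, dtype=float)
    p = p / p.sum()                       # normalise relative fluxes
    return comb(n_events, 2) * float(np.sum(p ** 2))


def mc_repeater_pairs(n_events, flux_fractions, n_mc=20000, seed=0):
    """Monte Carlo distribution of the number of event pairs sharing a source."""
    rng = np.random.default_rng(seed)
    p = np.asarray(flux_fractions, dtype=float)
    p = p / p.sum()
    hits = rng.multinomial(n_events, p, size=n_mc)   # events per source, per trial
    return (hits * (hits - 1) // 2).sum(axis=1)      # sum of C(n_i, 2) over sources


if __name__ == "__main__":
    flux = np.ones(50)                    # 50 equal-flux sources (assumption)
    print(expected_repeater_pairs(19, flux))          # analytic mean, 171 / 50
    counts = mc_repeater_pairs(19, flux)
    print(counts.mean(), np.mean(counts == 0))        # MC mean and P(C(0 deg) = 0)
```

With 50 equal-flux sources and N = 19 the analytic mean is 171/50 ≈ 3.4 repeater pairs. For a fixed number of sources, Σ p_i² is smallest when all fluxes are equal, so unequal fluxes (for example weights ∝ 1/d² for a hypothetical source list) can only increase the expected number of repeaters.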

Of course, deflections in magnetic fields and experimental
resolution effects prevent an experimental determination
of C(0°). As an example, we ask whether the strong
clustering signal in the PAO data coming from pairs
within 7° is compatible with repeaters. The following
arguments show that, without additional information, both
the hypothesis of repeaters and that of small-scale clustering
are consistent with the findings. The correlation functions
w(ϑ) of the astronomical catalogues previously discussed
are peaked at small scales, and actually strongly
peaked below one degree for the AGN samples. Yet, in the
excess signal in C(ϑ), this peak in w(ϑ) will be shifted
to larger scales, first of all because the number of events
within an angle ϑ scales approximately as N ∝ ϑ². Moreover,
we expect the small-scale signal in the data to be washed out
partly because of the angular resolution of the detector
(~1° for the PAO data set mentioned), and
mostly because of deflections in the Galactic magnetic
field (GMF) and possibly in extragalactic ones. Even protons
of the considered energy are expected to suffer average
deflections in the GMF of order ~3° [?]. Since the
Auger exposure peaks near the Galactic Center region,
which in typical GMF models is associated with larger
deflections, angular separations of up to 7° between events
from a single point-like proton source, caused by the GMF
alone, cannot be excluded. Other reasons for the large
separation angle are deflections in extragalactic magnetic
fields, or an intermediate-heavy chemical composition of
UHECRs that would lead to increased deflections in the GMF.

These arguments emphasize once more the importance
of a global comparison of the auto-correlation function
for a robust diagnosis.

¹ As a consequence, we note that a strong intrinsic small-scale
clustering within the experimental angular resolution might prevent
an unambiguous identification of the source of a given event even
in the absence of magnetic deflections.

FIG. 7: Probability distribution of C(0°) for the case of 19
events, an energy cut of E_cut = 8 × 10^19 eV, and a finite
number of uniformly distributed sources (top panel), and for
the different astrophysical models considered (bottom panel).
Notice that uniformly distributed sources predict P < 10^-3
for C(0°) = 1; the related curve would thus not be visible in
the plots.

B. Comparison with old experiments

Given the importance of a global comparison of the
auto-correlation function, one may wonder whether the already
existing publicly available data offer additional insight.
The only sufficiently large publicly available data set
is the one used for the first time in Ref. [?]. It
consists of ~100 events with energies E > 4 × 10^19 eV
from the HiRes stereo [?], AGASA [33], Yakutsk [?]
and SUGAR [?] experiments. Here, we have rescaled
the absolute energy scale of each experiment [1, 36] by
requiring that they correctly reproduce the dip spectral
feature [?].

[Fig. 8 plot; legend: AGN All, PSCz, Uniform]



FIG. 8: Cumulative number of pairs and 1σ error regions, as
in Figs. [?], for the set of pre-Auger CR data compared with
AGNs and PSCz galaxies within z = 0.02, and with a uniform
expectation.

Unfortunately, even at energies E ~ 5 × 10^19 eV, for the
case of protons and rectilinear propagation, sources
beyond redshift z ~ 0.04 contribute about half of the flux
(see Table I). At those distances the catalogues are known
to be incomplete, and although we correct for the selection
function, we cannot apply the method previously
outlined in a reliable, quantitative way. Nonetheless, in
Fig. 8 we compare, for illustrative purposes, the function
C(ϑ) computed for this data set with the corresponding
expectations from uniformly distributed sources and
from AGNs and the PSCz Galaxy Catalogue within
z < 0.02.
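As a reference point for uniform-expectation curves of this kind, the mean number of pairs for an isotropic distribution has a closed form if one assumes full-sky, uniform exposure. This is an assumption made only for the worked example: the real exposures of Auger and of the older experiments are not uniform. Each of the N(N − 1)/2 pairs independently has probability (1 − cos ϑ)/2 of being separated by at most ϑ, hence

```latex
\langle C(\vartheta) \rangle_{\rm iso}
  = \frac{N(N-1)}{2}\,\frac{1-\cos\vartheta}{2}
  \simeq \frac{N(N-1)}{8}\,\vartheta^{2}
  \qquad (\vartheta \ll 1\ {\rm rad}),
```

which makes the small-angle ϑ² scaling of the pair counts explicit. For N = 19 and ϑ = 7° this gives 171 × 0.0037 ≈ 0.64 pairs; for N = 100 and ϑ = 10° it gives 4950 × 0.0076 ≈ 38 pairs.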

It can be seen that the auto-correlation of the data
presents an excess at 10°–30° with respect to a uniform
distribution, corresponding to the medium-scale clustering
signal found in [?]. Although a quantitative analysis
would likely yield a poor fit, one recognizes a
qualitative similarity between the pattern of the data
function and that of the samples following the LSS, i.e.
AGNs and galaxies. (Note, however, that a comparison
below ϑ ~ 10° is not very meaningful, given, for example,
the poor angular resolution of SUGAR.) Clearly,
more statistics at higher energy and better quality
data are needed from UHECR experiments, while deeper
and more complete large-angle surveys would be welcome
from the astrophysical community.
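A quantitative version of such a comparison, the χ²-type global test over several angular scales suggested earlier in the text, could be sketched as follows. It assumes that mock data sets have been generated for a source model (for instance by Monte Carlo) and that C(ϑ_k) has been evaluated at K angles for both the data and the mocks. Because cumulative counts at different ϑ_k are strongly correlated, the sketch uses the full covariance estimated from the mocks; because the distributions are non-Gaussian, it calibrates the p-value on the mocks rather than on the χ² distribution. The function name is hypothetical, and re-using the same mocks to estimate the covariance and to calibrate the p-value introduces a small bias that a careful analysis would avoid with independent mock sets.

```python
import numpy as np


def global_chi2_test(data_counts, mock_counts):
    """Chi-square comparison of C(theta_k) at K angles, calibrated on mocks.

    data_counts : shape (K,), observed cumulative pair counts
    mock_counts : shape (M, K), counts for M mock data sets of one model
    Returns (chi2_obs, p_value).
    """
    data = np.asarray(data_counts, dtype=float)
    mocks = np.asarray(mock_counts, dtype=float)
    mean = mocks.mean(axis=0)
    cov = np.cov(mocks, rowvar=False)        # bins are correlated (cumulative)
    inv = np.linalg.pinv(cov)                # pseudo-inverse guards against singular cov
    d = data - mean
    chi2_obs = float(d @ inv @ d)
    dm = mocks - mean
    chi2_mocks = np.einsum("mk,kl,ml->m", dm, inv, dm)
    # Fraction of mocks at least as discrepant as the data (+1 avoids p = 0).
    p_value = (1 + np.count_nonzero(chi2_mocks >= chi2_obs)) / (1 + len(mocks))
    return chi2_obs, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mocks = rng.poisson([3.0, 20.0, 80.0], size=(2000, 3))  # synthetic stand-in
    print(global_chi2_test([3, 20, 80], mocks))     # data close to the model mean
    print(global_chi2_test([8, 40, 150], mocks))    # data far from the model mean
```

The Poisson mocks in the usage block are only a synthetic stand-in; in practice each row would hold C(ϑ_k) for one simulated data set of the source model under test.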

IV. DISCUSSION AND CONCLUSIONS 

We have examined the clustering properties of ordinary
galaxies, GRBs and AGNs with the aim of finding
characteristic features which may shed light on them as
possible UHECR sources. Our auto-correlation studies
have shown that, consistently with what is known from
the much larger SDSS galaxy and AGN catalogues (see
e.g. [?, 53, ?, 39]), nearby AGNs exhibit much stronger
small-scale clustering than average galaxies. The same is
true for the brightest galaxies in the PSCz catalogue, which
are mainly big ellipticals (plus some starburst galaxies).
Since many of them overlap with known AGNs, these
two samples are not truly separate, and the similarities
in the small-scale clustering of bright galaxies and AGNs
are not surprising. Unfortunately, both the overlap of the
overdensities of physically different classes of sources and the
pronounced small-scale clustering of many source
candidates work against a clear identification
of UHECR sources, e.g. by cross-correlation analyses.

We have argued that the auto-correlation functions of
different source classes differ considerably on all scales
and may be used as a tool to identify the sources of
UHECRs. Since the PAO has not yet published sufficient
information on its observed events, we were restricted
to a more conventional analysis considering just
one bin of the auto-correlation function. At present, the
most likely interpretation of the evidence reported by
Auger of "8 doublets separated by less than 7 degrees
in the 19 highest energy events" is that the sources of
UHECRs are either a strongly clustered sub-sample of
AGNs, or a sparse population of more or less isotropically
distributed sources (e.g. GRBs), possibly with pairs
of events within 7° coming from the same objects. From
our results, it is however clear that a comparison on all
angular scales would disentangle the two cases. In
principle, once the source population giving rise to UHECRs is
identified, the magnetic field deflection required to smear
out the original auto-correlation function might be fitted
and used for studies of the (extra-)galactic magnetic
field.

With statistics as low as twice the preliminary sample
analyzed by the Auger collaboration, we expect that a
first conclusive discrimination among source populations
should be possible. Measuring the difference between, e.g.,
different subclasses of AGNs as sources of cosmic rays
appears to be more difficult and requires more complete
catalogues of the nearby (z < 0.1) universe and much
larger statistics.

Probably, opening the era of UHECR astronomy will
require a combined advance in many aspects of UHECR
physics, from reducing the uncertainty on the absolute
energy scale to obtaining robust constraints on the chemical
composition of the primaries. At the same time, the field
would also benefit from advances in the astrophysics
of magnetic fields, such as constraints on the Galactic
magnetic field and refined simulations of extragalactic ones.
There is no doubt, however, that once born, UHECR astronomy
will pay off as an unprecedented diagnostic tool for the
study of the high-energy non-thermal universe, as well as
for measuring otherwise inaccessible extragalactic magnetic
fields.